{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Vessel Counts\n",
    "You should look at the `voyages_routes` table when you want information about where a vessel was _and_ when it was there. This table is useful when you want to find how many vessels were in a specific region of the South China Sea on January 1, 2020.  To do this, you use a simple `count` aggregation after you filter your data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "from descarteslabs.vektorius import vector\n",
    "from shapely.geometry import shape\n",
    "import warnings\n",
    "warnings.filterwarnings('ignore')\n",
    "\n",
    "aoi = shape({\n",
    "    \"type\": \"Polygon\",\n",
    "    \"coordinates\": [\n",
    "        [\n",
    "            [104.9853515625, 11.049038346537106],\n",
    "            [124.1455078125, 11.049038346537106],\n",
    "            [124.1455078125, 24.766784522874453],\n",
    "            [104.9853515625, 24.766784522874453],\n",
    "            [104.9853515625,11.049038346537106]\n",
    "        ]\n",
    "    ]\n",
    "})\n",
    "route = vector.table(\"voyages_routes\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`voyages_routes` is a pretty large table, with over 150 million rows, which means interacting with  it can be cumbersom if you're not careful about how you filter your data.  When working with this table it's especially important to use temporal `Filters` whenever possible, because the `timestamp` column is Clustered, and temporal filters can dramatically reduce the amount of data that needs to be scanned."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "67"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# find routes that intersect with the AOI at the given point in time\n",
    "filter_ = (route.timestamp.date() == \"2020-01-01\") & route.position.intersects(aoi)\n",
    "filtered_route = route.filter(filter_)\n",
    "\n",
    "# only grab distinct mmsi -representing the unique vessel identifier\n",
    "filtered_route = filtered_route[filtered_route.mmsi].distinct()\n",
    "\n",
    "# use the `count` aggregation to get the final count\n",
    "# Note that the result is a simple scalar int\n",
    "# because there's only 1 column and 1 row\n",
    "vessel_count = filtered_route.count().execute()\n",
    "vessel_count"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can make this aggregation a little more interesting by adding in some vessel information to the counts. You first Join vessel information from vessels to the filtered route information you grabbed before.\n",
    "\n",
    "Joins work by specifying a left and right table, a Join type (`inner_join`, `left_join`, `cross_join`, etc), and a Join condition."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "vessel = vector.table(\"vessels\")\n",
    "\n",
    "# `filtered_route` is the left table, that we want to inner_join\n",
    "# to `vessel` where the `mmsi` column of both tables match\n",
    "# only rows that appear in both tables will be returned\n",
    "join_condition = vessel.mmsi == filtered_route.mmsi\n",
    "joined = filtered_route.inner_join(vessel, join_condition)\n",
    "\n",
    "# once you make a join, you need to select which columns you want\n",
    "# returned from each table.  This needs to happen immediately after\n",
    "# the join\n",
    "data = joined[filtered_route.mmsi, vessel.ship_type]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now that you've performed the Join you can construct your aggregation, which uses a more complex form of aggregation that specifies which columns you want to group by, and what aggregation algorithm you want to use. In this case, you want to see how many voyages were made by each `ship_type`, so you need to group by `ship_type` and get a count of each voyage."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>ship_type</th>\n",
       "      <th>count</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Cargo</td>\n",
       "      <td>65</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Wing In Ground</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Other</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        ship_type  count\n",
       "0           Cargo     65\n",
       "1  Wing In Ground      1\n",
       "2           Other      1"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# group by class, and specify the aggretation algorithm for the group\n",
    "data = data.group_by(data.ship_type).aggregate(data.mmsi.count())\n",
    "\n",
    "# this time, vessel_count is a `DataFrame` not a `GeoDataFrame`\n",
    "# we have multiple columns and rows, but none of the columns are\n",
    "# a Geospatial data type, so a `GeoDataFrame` isn't necessary\n",
    "vessel_count = data.execute()\n",
    "vessel_count"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
